A Dirichlet-Multinomial Bayes Classifier for Disease Diagnosis with Microbial Compositions

Authors

  • Xiang Gao
  • Huaiying Lin
  • Qunfeng Dong
Abstract

Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes' theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC.

IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.
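The classification step described in the abstract can be illustrated with a minimal Python sketch. It is not the DMBC R package and does not reproduce its API: the function names, the example Dirichlet parameters, and the prior values are all hypothetical, and the parameters are assumed to be already fitted (the maximum-likelihood estimation and the feature selection are omitted). The sketch only shows how a Dirichlet-multinomial log-likelihood and a class prior combine through Bayes' theorem into a posterior probability.

```python
from math import exp, lgamma, log


def dm_log_likelihood(counts, alpha):
    """Log-probability of taxon counts under a Dirichlet-multinomial
    distribution with concentration parameters alpha (one per taxon)."""
    n = sum(counts)
    a = sum(alpha)
    # Terms that depend only on the totals.
    out = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    # Per-taxon terms.
    for x, al in zip(counts, alpha):
        out += lgamma(x + al) - lgamma(al) - lgamma(x + 1)
    return out


def posterior(counts, class_alphas, priors):
    """Posterior probability of each class for one sample, via Bayes'
    theorem. Works in log space to avoid underflow."""
    log_joint = {
        c: log(priors[c]) + dm_log_likelihood(counts, class_alphas[c])
        for c in class_alphas
    }
    m = max(log_joint.values())
    z = sum(exp(v - m) for v in log_joint.values())
    return {c: exp(v - m) / z for c, v in log_joint.items()}


# Hypothetical fitted parameters for three taxa (illustrative values only).
class_alphas = {
    "disease": [8.0, 1.0, 1.0],
    "healthy": [2.0, 4.0, 4.0],
}
# Hypothetical prior, e.g. taken from a published disease prevalence.
priors = {"disease": 0.1, "healthy": 0.9}

sample = [40, 5, 5]  # read counts per taxon for one new sample
post = posterior(sample, class_alphas, priors)
print(post)
```

Because the prior enters as a separate factor, the same fitted likelihoods can be reused with a different prevalence estimate simply by changing `priors`.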


Similar resources

Integrating Out Multinomial Parameters in Latent Dirichlet Allocation and Naive Bayes for Collapsed Gibbs Sampling

This note shows how to integrate out the multinomial parameters for latent Dirichlet allocation (LDA) and naive Bayes (NB) models. This allows us to perform Gibbs sampling without taking multinomial parameter samples. Although the conjugacy of the Dirichlet priors makes sampling the multinomial parameters relatively straightforward, sampling on a topic-by-topic basis provides two advantages. Fi...


Testing Significance in Bayesian Classifiers

The Fully Bayesian Significance Test (FBST) is a coherent Bayesian significance test for sharp hypotheses. This paper explores the FBST as a model selection tool for general mixture models, and gives some computational experiments for Multinomial-Dirichlet-Normal-Wishart models.


Properties of Bayes Factors Based on Test Statistics

This article examines the consistency, interpretation and application of Bayes factors constructed from standard test statistics. Primary conclusions are that Bayes factors based on multinomial and normal test statistics are consistent for suitable choices of the hyperparameters used to specify alternative hypotheses, and that such constructions can be extended to obtain consistent Bayes factor...


Functionality classification filter for websites

The objective of this thesis is to evaluate different models and methods for website classification. The websites are classified based on their functionality, in this case specifically whether they are forums, news sites or blogs. The analysis aims at solving a search engine problem, which means that it is interesting to know from which categories in a information search the results come. The d...


Stochastic Discriminative EM

Stochastic discriminative EM (sdEM) is an online-EM-type algorithm for discriminative training of probabilistic generative models belonging to the natural exponential family. In this work, we introduce and justify this algorithm as a stochastic natural gradient descent method, i.e. a method which accounts for the information geometry in the parameter space of the statistical model. We show how ...



Volume 2

Publication date: 2017